googleAuthR
searchConsoleR
googleAnalyticsR
googleComputeEngineR (cloudyr)
bigQueryR (cloudyr)
googleCloudStorageR (cloudyr)
googleLanguageR (rOpenSci)

Slack group to talk about the packages: #googleAuthRverse
https://www.rocker-project.org/
The Rocker project maintains useful R images:
rocker/r-ver
rocker/rstudio
rocker/tidyverse
rocker/shiny
rocker/ml-gpu

FROM rocker/tidyverse:3.6.0
MAINTAINER Mark Edmondson (r@sunholo.com)
# install R package dependencies
RUN apt-get update && apt-get install -y \
libssl-dev
## Install packages from CRAN
RUN install2.r --error \
-r 'http://cran.rstudio.com' \
googleAuthR \
googleComputeEngineR \
googleAnalyticsR \
searchConsoleR \
googleCloudStorageR \
bigQueryR \
## install Github packages
&& installGithub.r MarkEdmondson1234/youtubeAnalyticsR \
## clean up
  && rm -rf /tmp/downloaded_packages/ /tmp/*.rds

Flexible: no need to ask IT to install R everywhere; use docker run across cloud platforms; an ascendant tech
Version controlled: no worries that new package releases will break your code
Scalable: run multiple Docker containers at once; fits into an event-driven, stateless, serverless future
Continuous development with GitHub pushes
Good for one-off workloads
Pros
You can probably run the same code with no changes needed
Easy to setup
Cons
Expensive
May be better to have data in database
3.75TB of RAM: $423 a day (compare ~$1 a day for standard tier VM)
library(googleComputeEngineR)
# this will cost a lot
bigmem <- gce_vm("big-mem",
                 template = "rstudio",
                 predefined_type = "n1-ultramem-160")

library(googleComputeEngineR)
# your customised Docker image built via Build Triggers
custom_image <- gce_tag_container("custom-shiny-app",
                                  "your-project")

## make new Shiny template VM for your self-contained Shiny app
vm <- gce_vm("myapp",
             template = "shiny",
             predefined_type = "n1-standard-2",
             dynamic_image = custom_image)

googleCloudStorageR or bigQueryR
Good for parallelisable or scheduled data tasks
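A minimal sketch of staging data with googleCloudStorageR; the bucket and object names are illustrative, and it assumes authentication (e.g. via gcs_auth()) is already configured:

```r
library(googleCloudStorageR)

# illustrative bucket name; set once for the session
gcs_global_bucket("my-analysis-bucket")

# push results from the VM to Cloud Storage...
gcs_upload(mtcars, name = "results/mtcars.csv")

# ...and pull them back later, from any machine
my_results <- gcs_get_object("results/mtcars.csv")
```

bigQueryR follows the same pattern for tabular data that is better queried in place.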
Pros
Fault redundancy
Forces repeatable/reproducible infrastructure
library(future) makes parallel processing very usable
Cons
Changes to your code for split-map-reduce
Write meta code to handle I/O data and code
Not applicable to some problems
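The future workflow mentioned above can be tried locally before launching any VMs; a minimal sketch where multisession workers (background R sessions) stand in for the cloud cluster, with future_lapply() from the future.apply package:

```r
library(future)
library(future.apply)

# local background R sessions stand in for cloud VMs
plan(multisession, workers = 2)

slow_square <- function(x) {
  Sys.sleep(0.1)  # simulate a slow task
  x^2
}

# the identical call later runs unchanged against a GCE cluster
result <- future_lapply(1:4, slow_square)
unlist(result)
#> [1]  1  4  9 16
```

Swapping plan(multisession) for plan(cluster, workers = as.cluster(vms)) is the only change needed to move this to the cloud.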
New in googleComputeEngineR v0.3: a shortcut that launches the cluster and checks authentication for you
library(googleComputeEngineR)
vms <- gce_vm_cluster()
#2019-03-29 23:24:54> # Creating cluster with these arguments: template = r-base, dynamic_image = rocker/r-parallel, wait = FALSE, predefined_type = n1-standard-1
#2019-03-29 23:25:10> Operation running...
...
#2019-03-29 23:25:25> r-cluster-1 VM running
#2019-03-29 23:25:27> r-cluster-2 VM running
#2019-03-29 23:25:29> r-cluster-3 VM running
...
#2019-03-29 23:25:53> # Testing cluster:
#r-cluster-1 ssh working
#r-cluster-2 ssh working
#r-cluster-3 ssh working

googleComputeEngineR has a custom method for future::as.cluster()
## make a future cluster
library(future)
library(googleComputeEngineR)
vms <- gce_vm_cluster()
plan(cluster, workers = as.cluster(vms))
...do parallel...

# create cluster
vms <- gce_vm_cluster("r-vm", cluster_size = 3)
plan(cluster, workers = as.cluster(vms))
# get data
my_files <- list.files("myfolder")
my_data <- lapply(my_files, read.csv)
# forecast data in cluster
library(forecast)
cluster_f <- function(my_data, args = 4){
  forecast(auto.arima(ts(my_data, frequency = args)))
}

library(future.apply)  # provides future_lapply()
result <- future_lapply(my_data, cluster_f, args = 4)

Can nest future loops (use each CPU within each VM)
Thanks to Grant McDermott for figuring out the optimal method (Issue #129)
future_sim <-
  ## Outer future_lapply() call loops over the no. of VMs
  future_lapply(1:length(vms), FUN = function(x) {
    ## Inner future_lapply() call loops over desired no. of iterations / no. of VMs
    future_lapply(1:(iters / length(vms)), FUN = slow_func)
  })

3 VMs, 8 CPUs each = 24 threads (~$3 a day)
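A self-contained local approximation of the nested pattern: multisession workers stand in for the VMs, and the inner level is kept sequential here (on the real cluster it would fan out across each VM's CPUs); slow_func and iters are illustrative stand-ins:

```r
library(future)
library(future.apply)

# two-level topology: outer level plays the "VMs", inner level their CPUs
plan(list(tweak(multisession, workers = 3), sequential))

iters <- 6
n_vms <- 3
slow_func <- function(i) { Sys.sleep(0.05); i * 10 }  # illustrative workload

future_sim <- future_lapply(seq_len(n_vms), function(v) {
  ## inner loop: this "VM's" share of the iterations
  future_lapply(seq_len(iters / n_vms), slow_func)
})

length(future_sim)  # one result list per "VM"
```

On the real cluster the inner sequential level would be replaced by a multiprocess plan so each VM also uses all of its CPUs.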
Clusters of VMs + Docker = Horizontal scaling
Clusters of VMs + Docker + Task controller = Kubernetes
Good for Shiny / R APIs
Pros
Auto-scaling, task queues etc.
Scale to billions
Potentially cheaper
May already have cluster in your organisation
Cons
Needs stateless, idempotent workflows
Message broker?
Minimum 3 VMs
Built with Cloud Build on every GitHub push:
FROM rocker/shiny
MAINTAINER Mark Edmondson (r@sunholo.com)
# install R package dependencies
RUN apt-get update && apt-get install -y \
libssl-dev
## Install packages from CRAN needed for your app
RUN install2.r --error \
-r 'http://cran.rstudio.com' \
googleAuthR \
googleAnalyticsR
## assume shiny app is in build folder /shiny
COPY ./shiny/ /srv/shiny-server/myapp/

kubectl run shiny1 \
--image gcr.io/gcer-public/shiny-googleauthrdemo:prod \
--port 3838
kubectl expose deployment shiny1 \
  --target-port=3838 --type=NodePort

Built with Cloud Build on every GitHub push:
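The Dockerfile below copies a plumbed api.R into the image; a hypothetical minimal one, matching the /echo call curled at the end of this section:

```r
# api.R - hypothetical minimal plumber script for the image below

#* Echo back a message
#* @param msg the message to echo
#* @get /echo
function(msg = "") {
  paste0("The message is: ", msg)
}
```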
FROM trestletech/plumber
# copy your plumbed R script
COPY api.R /api.R
# default is to run the plumbed script
CMD ["api.R"]

kubectl run my-plumber \
--image gcr.io/your-project/my-plumber:prod \
--port 8000
kubectl expose deployment my-plumber \
  --target-port=8000 --type=NodePort

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: r-ingress-nginx
spec:
  rules:
  - http:
      paths:
      - path: /gar/
        # app deployed to /gar/shiny/
        backend:
          serviceName: shiny1
          servicePort: 3838

curl 'http://mydomain.com/api/echo?msg="its alive!"'
#> "The message is: its alive!"

A 40-minute talk at Google Next '19 with lots of new things to try!
https://www.youtube.com/watch?v=XpNVixSN-Mg&feature=youtu.be
A great video that goes further into Spark clusters, Jupyter notebooks, training with ML Engine, and scaling with Seldon on Kubernetes, which I haven't tried yet.